A realistic distributed storage system that 
minimizes data storage and repair 
bandwidth. 



Bernat Gaston, Jaume Pujol, and Merce Villanueva 

Department of Information and Communications Engineering 
Universitat Autonoma de Barcelona 
Cerdanyola del Valles (Barcelona), Spain 

{Bernat.Gaston | Jaume.Pujol | Merce.Villanueva}@uab.cat



Abstract 

In a realistic distributed storage environment, storage nodes are usually placed in racks, metallic supports designed to accommodate electronic equipment. It is known that the communication (bandwidth) cost between nodes within a rack is much lower than the communication (bandwidth) cost between nodes in different racks.

In this paper, a new model, where the storage nodes are placed in two racks, is proposed and analyzed. In this model, the cost of repairing a failed node depends on whether the helper nodes are placed in the same rack as the new node or in the other rack. A threshold function, which minimizes the amount of stored data per node and the bandwidth needed to regenerate a failed node, is given. This threshold function generalizes the threshold functions of previous distributed storage models. The tradeoff curve obtained from this threshold function is compared with those obtained from the previous models, and it is shown that the new model outperforms the previous ones in terms of repair cost.



I. Introduction

In a distributed storage environment, where data are stored in nodes connected through a network, node failures are to be expected. It is known that erasure coding improves fault tolerance and minimizes the amount of stored data [1], [2]. Moreover, regenerating codes not only retain the advantages of erasure coding, but also minimize the amount of data needed to regenerate a failed node [3].

In realistic distributed storage environments, for example a storage cloud, the data are placed in storage devices connected through a network. These storage devices are usually organized in racks, metallic supports designed to accommodate electronic equipment. The communication (bandwidth) cost between nodes within a rack is much lower than the communication (bandwidth) cost between nodes in different racks.

In [3], an optimal tradeoff between the amount of stored data per node and the bandwidth needed to regenerate a failed node (repair bandwidth) in a distributed storage environment was established. This tradeoff was proved by computing the mincut of information flow graphs, and it can be represented as a curve whose two extremal points are called the Minimum Storage Regenerating (MSR) point and the Minimum Bandwidth Regenerating (MBR) point.

Figure 1: Information flow graph corresponding to a [4,2,3] regenerating code.

In [4], another model was introduced, with a static classification of the storage nodes into "cheap bandwidth" and "expensive bandwidth" nodes. However, this classification is not based on racks, because the nodes in the expensive set are always expensive in terms of repair cost, regardless of which node has failed.

This paper is organized as follows. In Section II, we analyze previous distributed storage models. In Section III, we present a new model, where the storage nodes are placed in two racks. We also give a general threshold function and specify the MBR and MSR points in this model. In Section IV, we analyze the results of this new model compared with the previous ones. Finally, in Section V, we present the conclusions of this study.

II. Previous models 

In this section, we describe the previous distributed storage models: the basic model and the static cost model, introduced in [3] and [4], respectively.

A. Basic model 

In [3], Dimakis et al. introduced a first distributed storage model, where the repair cost between any two storage nodes is the same. Moreover, they derived the fundamental tradeoff between the amount of stored data per node and the repair bandwidth by analyzing the mincut of an information flow graph.

Let $\mathcal{C}$ be an $[n, k, d]$ regenerating code composed of $n$ storage nodes, each one storing $\alpha$ data units, such that any $k$ of these $n$ storage nodes contain enough information to recover the file. In order to be able to recover a file of size $M$, it is necessary that $k\alpha \ge M$. When one node fails, $d$ of the remaining $n - 1$ storage nodes send $\beta$ data units each to the new node which will replace the failed one. The new node is called the newcomer, and the nodes sending data to the newcomer are called helper nodes. The total amount of bandwidth used per node regeneration is $\gamma = d\beta$.

Let $s_i$, $i = 1, \dots, \infty$, be the $i$-th storage node. Let $G(V, E)$ be a weighted graph designed to represent the information flow. $G$ is a directed acyclic graph with a set of vertices $V$ and a set of arcs $E$. The set $V$ is composed of three kinds of vertices:

• Source vertex $S$: there is only one source vertex in the graph, and it represents the file to be stored.

• Data collector vertex $DC$: it represents the user who is allowed to access the data in order to reconstruct the file.

• Storage node vertices $v^i_{in}$ and $v^i_{out}$: each storage node $s_i$, $i = 1, \dots, \infty$, is represented by one inner vertex $v^i_{in}$ and one outer vertex $v^i_{out}$. Let $V_s \subset V$ be the set of all these storage node vertices.

In general, there is an arc $(v, w) \in E$ of weight $c$ from vertex $v \in V$ to vertex $w \in V$ if vertex $v$ can send $c$ data units to vertex $w$.

At the beginning of the life of a distributed storage environment, there is a file to be stored in $n$ storage nodes $s_i$, $i = 1, \dots, n$. This means that there is a source vertex $S$ with outdegree $n$ connected to the vertices $v^i_{in}$, $i = 1, \dots, n$. Since we want to analyze the information flow of $G$ in terms of $\alpha$ and $\beta$, and these $n$ arcs are not significant for finding the mincut of $G$, their weight is set to infinity. Each storage node $s_i$, $i = 1, \dots, n$, stores $\alpha$ data units. To represent this, each vertex $v^i_{in}$ is connected to $v^i_{out}$ by an arc of weight $\alpha$.

When the first storage node fails, the newcomer $s_{n+1}$ connects to $d$ existing storage nodes, each of which sends it $\beta$ data units. So there is an arc from $v^i_{out}$, $i = 1, \dots, n$, to $v^{n+1}_{in}$ with weight $\beta$ if $s_i$ sends $\beta$ data units to $s_{n+1}$ in the regenerating process. The new vertex $v^{n+1}_{in}$ is also connected to its associated $v^{n+1}_{out}$ by an arc of weight $\alpha$. This process is repeated for every failed node. Let the new storage nodes (newcomers) be $s_j$, $j = n + 1, \dots, \infty$.

Finally, after some failures, a data collector wants to reconstruct the file, so a vertex $DC$ is also added to the graph. There is an arc from vertex $v^i_{out}$ to $DC$ if the data collector connects to the storage node $s_i$. Note that if $s_i$ has been replaced by $s_j$, the vertex $DC$ cannot connect to $v^i_{out}$, but it can connect to $v^j_{out}$. The vertex $DC$ has indegree $k$, and each of its incoming arcs has infinite weight, because these arcs are not relevant for finding the mincut of $G$.

If the mincut from vertex $S$ to $DC$ satisfies $\operatorname{mincut}(S, DC) \ge M$, the data collector can reconstruct the file, since there is enough information flow from the source to the data collector. In fact, the data collector can connect to any $k$ nodes, so it is required that $\min(\operatorname{mincut}(S, DC)) \ge M$, where the minimum is attained when the data collector connects to $k$ storage nodes that have already been replaced by newcomers [3]. From this scenario, the mincut is computed and lower bounds on the parameters $\alpha$ and $\gamma$ are given. Let $\alpha^*(d, \gamma)$ be the threshold function, that is, the function giving the minimum achievable value of $\alpha$. Since $\alpha^*(d, \gamma)$ can be achieved, any $\alpha \ge \alpha^*(d, \gamma)$ is also possible.
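A minimal sketch of how this mincut can be evaluated numerically: the Python code below builds an information flow graph for the [4,2,3] code of Figure 1 and computes mincut(S, DC) as a maximum flow, using the Edmonds-Karp algorithm and the max-flow/min-cut theorem. The failure sequence ($s_1$ replaced by $s_5$ with helpers $s_2, s_3, s_4$, then $s_2$ replaced by $s_6$ with helpers $s_3, s_4, s_5$), the vertex names such as `in5`/`out5`, and the large constant `INF` standing in for the infinite weights are assumptions of this illustration; integer values of $\alpha$ and $\beta$ keep the arithmetic exact.

```python
from collections import defaultdict, deque

INF = 10**9  # large constant standing in for the "infinite" arc weights


def max_flow(cap, s, t):
    """Edmonds-Karp max flow; by max-flow/min-cut it equals mincut(s, t)."""
    res = defaultdict(lambda: defaultdict(int))  # residual capacities
    for u, arcs in cap.items():
        for v, c in arcs.items():
            res[u][v] += c
            res[v][u] += 0  # make sure the reverse arc exists
    flow = 0
    while True:
        # breadth-first search for a shortest augmenting path
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # collect the path, find its bottleneck and push that much flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push


def figure1_graph(alpha, beta):
    """Information flow graph of a [4,2,3] code after two failures (assumed sequence)."""
    cap = defaultdict(dict)
    for i in range(1, 5):
        cap["S"][f"in{i}"] = INF          # the file is written to s1..s4
    for i in range(1, 7):
        cap[f"in{i}"][f"out{i}"] = alpha  # every node stores alpha units
    for h in (2, 3, 4):
        cap[f"out{h}"]["in5"] = beta      # s5 replaces s1
    for h in (3, 4, 5):
        cap[f"out{h}"]["in6"] = beta      # s6 replaces s2
    cap["out5"]["DC"] = INF               # DC connects to the k = 2 newcomers
    cap["out6"]["DC"] = INF
    return cap


alpha, beta = 5, 2
print(max_flow(figure1_graph(alpha, beta), "S", "DC"))  # 9
print(min(3 * beta, alpha) + min(2 * beta, alpha))      # 9
```

Both lines agree with the expression $\min\{3\beta, \alpha\} + \min\{2\beta, \alpha\}$ given for this graph; a generic max-flow routine is used here only as a check, not as part of the model.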

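Figure 1 illustrates the information flow graph $G$ associated with a [4,2,3] regenerating code. Note that $\operatorname{mincut}(S, DC) = \min\{3\beta, \alpha\} + \min\{2\beta, \alpha\}$. For a general information flow graph, the condition becomes $\sum_{i=0}^{k-1} \min\{(d - i)\beta, \alpha\} \ge M$, which after an optimization process leads to

$$
\alpha^*(d, \gamma) =
\begin{cases}
\frac{M}{k}, & \gamma \in [f(0), +\infty), \\
\frac{M - g(i)\gamma}{k - i}, & \gamma \in [f(i), f(i-1)),\ i = 1, \dots, k-1,
\end{cases}
\tag{1}
$$

where

$$
f(i) = \frac{2Md}{(2k - i - 1)i + 2k(d - k + 1)} \quad \text{and} \quad g(i) = \frac{(2d - 2k + i + 1)i}{2d}.
$$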
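The threshold function (1) can be evaluated directly. The sketch below is a straightforward transcription into Python with exact fractions; the function names are ours. For the [4,2,3] example with $M = 1$ it checks the two extremal points: at $\gamma = f(0)$ it returns $\alpha = M/k$ (MSR), and at $\gamma = f(k-1)$ it returns $\alpha = \gamma$ (MBR).

```python
from fractions import Fraction


def f(i, M, k, d):
    """Breakpoint f(i) of equation (1), a value of the repair bandwidth gamma."""
    return Fraction(2 * M * d, (2 * k - i - 1) * i + 2 * k * (d - k + 1))


def g(i, k, d):
    """Coefficient g(i) of equation (1)."""
    return Fraction((2 * d - 2 * k + i + 1) * i, 2 * d)


def alpha_star(gamma, M, k, d):
    """Threshold function alpha*(d, gamma) of equation (1)."""
    gamma = Fraction(gamma)
    if gamma >= f(0, M, k, d):
        return Fraction(M, k)
    for i in range(1, k):
        if f(i, M, k, d) <= gamma < f(i - 1, M, k, d):
            return (M - g(i, k, d) * gamma) / (k - i)
    raise ValueError("gamma is below the MBR point f(k-1)")


M, k, d = 1, 2, 3  # parameters of the [4,2,3] example
print(f(0, M, k, d), alpha_star(f(0, M, k, d), M, k, d))          # 3/4 1/2
print(f(k - 1, M, k, d), alpha_star(f(k - 1, M, k, d), M, k, d))  # 3/5 3/5
```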
Using the information flow graph $G$, we can see that there are exactly $k$ points in the tradeoff curve, or equivalently, $k$ intervals in the threshold function $\alpha^*(d, \gamma)$, which correspond to the $k$ newcomers. In the mincut equation, each of the $k$ terms in the summation is the minimum of two quantities: the sum of the weights of the arcs that must be cut to isolate the corresponding $v^j_{in}$ from $S$, and the weight of the arc that must be cut to isolate the corresponding $v^j_{out}$ from $S$. The first quantity is called the income of the corresponding newcomer $s_j$. Note that the income of the newcomer $s_j$ depends on the previous newcomers.

B. Static cost model 

In [4], Akhlaghi et al. presented another distributed storage model, where the storage nodes $V_s$ are partitioned into two sets $V_1$ and $V_2$ with different repair bandwidth costs. Let $V_1 \subset V_s$ be the "cheap bandwidth" nodes, where each data unit sent costs $C_c$, and let $V_2 \subset V_s$ be the "expensive bandwidth" nodes, where each data unit sent costs $C_e$, with $C_e > C_c$. This means that when a newcomer replaces a lost storage node, downloading data from a node in $V_1$ costs less than downloading the same amount of data from a node in $V_2$.

Consider the same situation as in the model described in Subsection II-A. However, when a storage node fails, the newcomer $s_j$, $j = n + 1, \dots, \infty$, connects to $d_1$ existing storage nodes from $V_1$, each sending $\beta_c$ data units to $s_j$, and to $d_2$ existing storage nodes from $V_2$, each sending $\beta_e$ data units to $s_j$. Let $d = d_1 + d_2$ be the number of helper nodes. Assume that $d$, $d_1$, and $d_2$ are fixed, that is, they do not depend on the storage node $s_j$. In terms of the information flow graph $G$, there is an arc from $v^i_{out}$ to $v^j_{in}$ of weight $\beta_c$ or $\beta_e$, depending on whether $s_i$ sends $\beta_c$ or $\beta_e$ data units, respectively, in the regenerating process. The new vertex $v^j_{in}$ is also connected to its associated $v^j_{out}$ by an arc of weight $\alpha$.

Let the repair cost be $C_T = d_1 C_c \beta_c + d_2 C_e \beta_e$ and the repair bandwidth $\gamma = d_1\beta_c + d_2\beta_e$. To simplify the model, we can assume, without loss of generality, that $\beta_c = \tau\beta_e$ for some real number $\tau \ge 1$. This means that the repair cost $C_T$ is minimized by downloading more data units from the "cheap bandwidth" set $V_1$ than from the "expensive bandwidth" set $V_2$. Note that if $\tau$ is increased, the repair cost decreases, and vice versa. Again, it must be satisfied that $\min(\operatorname{mincut}(S, DC)) \ge M$.

When $k \le d_1$, the mincut condition is $\sum_{i=0}^{k-1} \min\{d_1\beta_c + d_2\beta_e - i\beta_c, \alpha\} \ge M$, and when $k > d_1$ it is

$$
\sum_{i=0}^{d_1} \min\{d_1\beta_c + d_2\beta_e - i\beta_c, \alpha\} + \sum_{i=d_1+1}^{k-1} \min\{(d_1 + d_2 - i)\beta_e, \alpha\} \ge M.
$$

After applying $\beta_c = \tau\beta_e$ and an optimization process, the mincut equations lead to the threshold function shown in [4].
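As a sketch (function names are ours, and $\beta_c = \tau\beta_e$ is substituted as in the text), the left-hand side of the static cost mincut condition can be computed term by term from the incomes of the $k$ newcomers. Following the remark in Section III that incomes cannot be negative, terms of the second sum are clipped at zero; this only matters when $d_2 < k - d_1 - 1$.

```python
def static_incomes(k, d1, d2, tau, beta_e):
    """Incomes of the k newcomers in the static cost model, with beta_c = tau * beta_e."""
    beta_c = tau * beta_e
    incomes = []
    for i in range(k):
        if i <= d1:
            # terms of the first sum: d1*beta_c + d2*beta_e - i*beta_c
            incomes.append(d1 * beta_c + d2 * beta_e - i * beta_c)
        else:
            # terms of the second sum: (d1 + d2 - i)*beta_e, never below zero
            incomes.append(max(0, (d1 + d2 - i) * beta_e))
    return incomes


def static_mincut(k, d1, d2, tau, beta_e, alpha):
    """Left-hand side of the mincut condition: sum over newcomers of min{income, alpha}."""
    return sum(min(x, alpha) for x in static_incomes(k, d1, d2, tau, beta_e))


print(static_incomes(4, 1, 3, 2, 1))    # [5, 3, 2, 1]
print(static_mincut(4, 1, 3, 2, 1, 4))  # 10
```

The call uses $k = 4$, $d_1 = 1$, $d_2 = 3$, $\tau = 2$ and $\beta_e = 1$, the same parameters as the rack model example of Figure 2, so the two models can be compared on identical inputs.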

III. Rack model 

In a realistic distributed storage environment, the storage devices are organized in racks. In this case, the repair cost between nodes in the same rack is much lower than between nodes in different racks.

Note the difference between this model and the one presented in Subsection II-B. In that model, there is a static classification of the storage nodes into "cheap bandwidth" and "expensive bandwidth" ones. In our new model, this classification depends on each newcomer. When a storage node fails and a newcomer enters the system, nodes from the same rack are in the "cheap bandwidth" set, while nodes in the other rack are in the "expensive bandwidth" set. In this paper, we analyze the case when there are only two racks. Let $V_1$ and $V_2$ be the sets of $n_1$ and $n_2$ storage nodes in the first and second rack, respectively.

Consider the same situation as in Subsection II-B, but now the sets of "cheap bandwidth" and "expensive bandwidth" nodes depend on the specific replaced node. Again, we can assume, without loss of generality, that $\beta_c = \tau\beta_e$ for some real number $\tau \ge 1$. Let the newcomers be the storage nodes $s_j$, $j = n + 1, \dots, \infty$. Let $d = d_1 + d_2$ be the number of helper nodes for any newcomer, where $d_1$ and $d_2$ are the numbers of helper nodes in the first and second rack, respectively. We can always assume that $d_1 \le d_2$, by swapping the racks if necessary.

In both models presented in Section II, the repair bandwidth $\gamma$ is the same for any newcomer. In the rack model, it depends on the rack where the newcomer is placed. Let $\gamma_1 = \beta_e(d_1\tau + d_2)$ be the repair bandwidth for any newcomer in the first rack, with repair cost $C_T^1 = \beta_e(C_c d_1\tau + C_e d_2)$, and let $\gamma_2 = \beta_e(d_2\tau + d_1)$ be the repair bandwidth for any newcomer in the second rack, with repair cost $C_T^2 = \beta_e(C_c d_2\tau + C_e d_1)$. Note that if $d_1 = d_2$ or $\tau = 1$, then $\gamma_1 = \gamma_2$; otherwise $\gamma_1 < \gamma_2$, since $\gamma_2 - \gamma_1 = \beta_e(\tau - 1)(d_2 - d_1)$. To represent a distributed storage system, the information flow graph is restricted to $\gamma \ge \alpha$ [3]. In the rack model, it is a necessary condition that $\gamma_1 \ge \alpha$, which implies $\gamma_2 \ge \alpha$.
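The four quantities above translate directly into code. This hypothetical helper returns $\gamma_1$, $\gamma_2$, $C_T^1$ and $C_T^2$; the numbers in the call ($C_c = 1$ and $C_e = 10$ as in Figure 4, $d_1 = 1$, $d_2 = 3$, $\tau = 2$ as in Figure 2, and $\beta_e = 1/14$) are illustrative choices only.

```python
from fractions import Fraction


def rack_repair(d1, d2, tau, beta_e, Cc, Ce):
    """Repair bandwidth and cost of a newcomer in each rack (beta_c = tau * beta_e)."""
    gamma1 = beta_e * (d1 * tau + d2)           # rack 1: d1 cheap and d2 expensive helpers
    gamma2 = beta_e * (d2 * tau + d1)           # rack 2: d2 cheap and d1 expensive helpers
    cost1 = beta_e * (Cc * d1 * tau + Ce * d2)  # C_T^1
    cost2 = beta_e * (Cc * d2 * tau + Ce * d1)  # C_T^2
    return gamma1, gamma2, cost1, cost2


print(*rack_repair(d1=1, d2=3, tau=2, beta_e=Fraction(1, 14), Cc=1, Ce=10))
# 5/14 1/2 16/7 8/7
```

The gap $\gamma_2 - \gamma_1 = 1/2 - 5/14 = 1/7$ equals $\beta_e(\tau - 1)(d_2 - d_1) = 2/14$, as expected.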

Moreover, unlike the models presented in Section II, where it is straightforward to establish which set of nodes minimizes the mincut, in the rack model this set may change depending on the parameters $k$, $d_1$, $n_1$, and $\tau$. Recall that the income of a newcomer $s_j$, $j = n + 1, \dots, \infty$, is the sum of the weights of the arcs that must be cut in order to isolate $v^j_{in}$ from $S$. Let $I$ be the indexed multiset containing the incomes of the $k$ newcomers which minimize the mincut. It is easy to see that in the model presented in Subsection II-A, $I = \{(d - i)\beta \mid i = 0, \dots, k - 1\}$, and in the one presented in Subsection II-B,

$$
I = \{((d_1 - i)\tau + d_2)\beta_e \mid i = 0, \dots, \min\{d_1, k - 1\}\} \cup \{(d_2 - i)\beta_e \mid i = 1, \dots, \min\{d_2, k - d_1 - 1\}\}.
$$

In order to establish $I$ in the rack model, the set of $k$ newcomers which minimize the mincut must be found. First, note that since $d_1 \le d_2$, the income of the newcomers is minimized by first replacing $d_1 + 1$ nodes from the rack with fewer helper nodes, which in fact minimizes the mincut. Therefore, the indexed multiset $I$ always contains the incomes of a set of $d_1 + 1$ newcomers from $V_1$ (or of all $k$ newcomers, if $k - 1 \le d_1$). Define

$$
I_1 = \{((d_1 - i)\tau + d_2)\beta_e \mid i = 0, \dots, \min\{d_1, k - 1\}\}
$$

as the indexed multiset where $I_1[i]$, $i = 0, \dots, \min\{d_1, k - 1\}$, are the incomes of this set of newcomers from $V_1$. If $k - 1 \le d_1$, then $I = I_1$; otherwise $I_1 \subset I$, and $k - d_1 - 1$ more newcomers which minimize the mincut must be found.

At this point there are two possibilities: either the remaining nodes from $V_1$ are in the set of newcomers which minimize the mincut or not. Define

$$
I_2 = \{d_2\beta_e \mid i = 1, \dots, \min\{k - d_1 - 1, n_1 - d_1 - 1\}\} \cup \{(d_2 - i)\tau\beta_e \mid i = 1, \dots, \min\{d_2, k - n_1\}\}
$$

as the indexed multiset where $I_2[i]$, $i = 0, \dots, k - d_1 - 2$, are the incomes of a set of $k - d_1 - 1$ newcomers, including the remaining $n_1 - d_1 - 1$ newcomers from $V_1$ and newcomers from $V_2$. Note that if $n_1 - d_1 - 1 \ge k - d_1 - 1$, it only contains newcomers from $V_1$. Define

$$
I_3 = \{(d_2 - i)\tau\beta_e \mid i = 1, \dots, \min\{d_2, k - d_1 - 1\}\}
$$

as the indexed multiset where $I_3[i]$, $i = 0, \dots, k - d_1 - 2$, are the incomes of a set of $k - d_1 - 1$ newcomers from $V_2$.

Note that if $i > d_2$ were allowed in $I_2$ or $I_3$, the resulting income would be negative, which is not possible; in fact, according to the information flow graph, the income of any further newcomer is zero. It can be assumed that $d_2 \ge k - d_1 - 1 \ge k - n_1$, because the mincut equation does not change when $d_2 < k - d_1 - 1$ or $d_2 < k - n_1$.
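A sketch of the construction of $I_1$, $I_2$ and $I_3$ (all incomes divided by $\beta_e$), transcribing the index ranges above literally; the helper name is ours. Empty ranges, for instance when $k - 1 \le d_1$, simply produce empty lists.

```python
def rack_income_sets(k, d1, d2, n1, tau):
    """I1, I2 and I3 of the rack model, with every income divided by beta_e."""
    # I1: the first min(d1, k-1) + 1 newcomers, all from V1
    I1 = [(d1 - i) * tau + d2 for i in range(min(d1, k - 1) + 1)]
    # I2: remaining newcomers of V1 (income d2 each), followed by newcomers of V2
    I2 = [d2 for _ in range(1, min(k - d1 - 1, n1 - d1 - 1) + 1)]
    I2 += [(d2 - i) * tau for i in range(1, min(d2, k - n1) + 1)]
    # I3: k - d1 - 1 newcomers, all from V2
    I3 = [(d2 - i) * tau for i in range(1, min(d2, k - d1 - 1) + 1)]
    return I1, I2, I3


print(rack_income_sets(k=4, d1=1, d2=3, n1=3, tau=2))  # ([5, 3], [3, 4], [4, 2])
```

For the parameters of Figure 2 with $\tau = 2$, this reproduces $I_1 = \{5\beta_e, 3\beta_e\}$, $I_2 = \{3\beta_e, 4\beta_e\}$ and $I_3 = \{4\beta_e, 2\beta_e\}$, the values listed for that example.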

Proposition 1. Since $|I_2| = |I_3| = k - d_1 - 1$, if $\sum_{i=0}^{k-d_1-2} I_2[i] \le \sum_{i=0}^{k-d_1-2} I_3[i]$, then $I = I_1 \cup I_2$; and if $\sum_{i=0}^{k-d_1-2} I_2[i] > \sum_{i=0}^{k-d_1-2} I_3[i]$, then $I = I_1 \cup I_3$.

Proof: Let $J$ be an indexed multiset containing the incomes of a set of newcomers such that $I = I_1 \cup J$. It can be seen that either $J = I_2$ or $J = I_3$. □

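By Proposition 1, if $I = I_1 \cup I_2$, the corresponding mincut equation is

$$
\sum_{i=0}^{d_1} \min\{I_1[i], \alpha\} + \sum_{i=0}^{k-d_1-2} \min\{I_2[i], \alpha\} \ge M,
$$

and if $I = I_1 \cup I_3$, the equation is

$$
\sum_{i=0}^{d_1} \min\{I_1[i], \alpha\} + \sum_{i=0}^{k-d_1-2} \min\{I_3[i], \alpha\} \ge M.
$$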
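Proposition 1 and the resulting mincut equations can be sketched as follows (helper names are ours; incomes are given in units of $\beta_e$). The values $\alpha = 5/14$ and $\beta_e = 1/14$ with $M = 1$ are our own illustrative choice; for the incomes of the Figure 2 example they make the left-hand side equal exactly $M$.

```python
from fractions import Fraction


def select_incomes(I1, I2, I3):
    """Proposition 1: I = I1 + I2 if sum(I2) <= sum(I3), otherwise I = I1 + I3."""
    return list(I1) + (list(I2) if sum(I2) <= sum(I3) else list(I3))


def mincut_lhs(incomes, alpha, beta_e):
    """Left-hand side of the mincut equation: sum of min{I[i], alpha}, I[i] in units of beta_e."""
    return sum(min(x * beta_e, alpha) for x in incomes)


I = select_incomes([5, 3], [3, 4], [4, 2])  # Figure 2 example, tau = 2
print(I)                                    # [5, 3, 4, 2]
print(mincut_lhs(I, alpha=Fraction(5, 14), beta_e=Fraction(1, 14)))  # 1
```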

In the previous models, described in Section II, the decreasing behavior of the incomes included in the mincut equation is used to find the threshold function that minimizes the parameters $\alpha$ and $\gamma$. In the rack model, the incomes in the mincut equations may not decrease as the newcomers enter the system; for instance, in the example of Figure 2 the incomes are $5\beta_e, 3\beta_e, 4\beta_e, 2\beta_e$. Therefore, it is not possible to find the threshold function as in the previous models. However, we give a threshold function for the rack model which also represents the behavior of the mincut equations of the previous models, so it can be seen as a generalization.

Let $L$ be the list of the values $I[i]/\beta_e$, $i = 0, \dots, k - 1$, sorted in increasing order, so that $|L| = |I|$. Note that the information flow graphs representing any model from Section II, as well as those representing the rack model, can be described in terms of $I$, so they can be represented by $L$. Therefore, once $L$ is found, it is possible to find the parameters $\alpha$ and $\beta_e$ (and then $\gamma$ or $\gamma_i$, $i = 1, 2$) using the following threshold function.

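Theorem 1. The threshold function $\alpha^*(d_1, d_2, \beta_e)$ (which also depends on $\tau$ and $k$) is the following:

$$
\alpha^*(d_1, d_2, \beta_e) =
\begin{cases}
\frac{M}{k}, & \beta_e \in [f(0), +\infty), \\
\frac{M - g(i)\beta_e}{k - i}, & \beta_e \in [f(i), f(i-1)),\ i = 1, \dots, k-1,
\end{cases}
\tag{2}
$$

subject to $\gamma_1 = (d_1\tau + d_2)\beta_e \ge \alpha$, where

$$
f(i) = \frac{M}{L[i](k - i) + g(i)} \quad \text{and} \quad g(i) = \sum_{j=0}^{i-1} L[j].
$$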
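A short derivation, which is our own reading of where $f(i)$ and $g(i)$ come from, assuming $L$ is sorted increasingly and $g(i) = L[0] + \dots + L[i-1]$: if $L[i-1]\beta_e \le \alpha \le L[i]\beta_e$, the $i$ smallest terms of the mincut sum equal $L[j]\beta_e$ and the remaining $k - i$ terms equal $\alpha$, so

$$
\sum_{j=0}^{k-1} \min\{L[j]\beta_e, \alpha\} = g(i)\beta_e + (k - i)\alpha \ge M
\iff \alpha \ge \frac{M - g(i)\beta_e}{k - i}.
$$

The corresponding segment ends where this bound meets $\alpha = L[i]\beta_e$:

$$
L[i]\beta_e(k - i) + g(i)\beta_e = M \iff \beta_e = \frac{M}{L[i](k - i) + g(i)} = f(i).
$$

For the basic model, $L[j] = d - k + 1 + j$ and $\beta = \gamma/d$, which gives $g(i)\beta = \frac{(2d - 2k + i + 1)i}{2d}\gamma$ and $d\,f(i) = \frac{2Md}{(2k - i - 1)i + 2k(d - k + 1)}$, that is, exactly the $g$ and $f$ of equation (1).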

Two values in $L$ may be equal, so that $f(i) = f(i-1)$; in this case, we consider the interval $[f(i), f(i-1))$ to be empty. Note that the threshold function (2) is subject to $\gamma_1 = (d_1\tau + d_2)\beta_e \ge \alpha$. However, $\gamma_1 \ge \alpha$ is only satisfied when the highest value of $I_1$ divided by $\beta_e$ coincides with the highest value of $L$. By definition, $\max I_1 = I_1[0]$, so this condition reads $\max L = I_1[0]/\beta_e$. In terms of the tradeoff curve, this means that no point of the curve outperforms the MBR point. In order to achieve $\gamma_1 \ge \alpha$, it is necessary that

$$
f(i) \ge \frac{M}{\frac{I_1[0]}{\beta_e}(k - i) + g(i)}, \quad i = 0, \dots, k - 1.
$$

This restriction is achieved by removing from $L$ any value $L[i]$ such that $L[i] > I_1[0]/\beta_e$, $i = 0, \dots, k - 1$. From now on, we assume that $L[|L| - 1] = I_1[0]/\beta_e$.
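A minimal Python sketch of Theorem 1, assuming the multiset $I$ is already known (in units of $\beta_e$). It sorts the values into $L$, removes any value above $I_1[0]/\beta_e = d_1\tau + d_2$ as described above, builds $g(i)$ as the prefix sums $L[0] + \dots + L[i-1]$ (our reading of $g$, which reduces to the $g$ of equation (1) for the basic model), and returns the breakpoints $f(i)$ together with $\alpha^*$. Function and variable names are ours.

```python
from fractions import Fraction


def threshold_function(I_values, k, M, d1, d2, tau):
    """Sorted, pruned L, breakpoints f(i) and alpha*(d1, d2, beta_e) of Theorem 1."""
    cap = d1 * tau + d2                            # I1[0] / beta_e
    L = sorted(v for v in I_values if v <= cap)    # drop values that would violate gamma1 >= alpha
    g = [sum(L[:i]) for i in range(len(L))]        # g(i) = L[0] + ... + L[i-1]
    f = [Fraction(M, L[i] * (k - i) + g[i]) for i in range(len(L))]

    def alpha_star(beta_e):
        beta_e = Fraction(beta_e)
        if beta_e >= f[0]:
            return Fraction(M, k)
        for i in range(1, len(L)):
            if f[i] <= beta_e < f[i - 1]:          # empty when f[i] == f[i-1]
                return (M - g[i] * beta_e) / (k - i)
        raise ValueError("beta_e is below the MBR point f(|L|-1)")

    return L, f, alpha_star


L, f, a = threshold_function([5, 3, 4, 2], k=4, M=1, d1=1, d2=3, tau=2)
print(L)                  # [2, 3, 4, 5]
print(*f)                 # 1/8 1/11 1/13 1/14
print(a(Fraction(1, 14)))  # 5/14
```

The call uses the incomes of the Figure 2 example with $M = 1$; at the last breakpoint $\alpha^* = 5/14 = (d_1\tau + d_2)\beta_e = \gamma_1$, so the constraint of Theorem 1 holds with equality there.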



Figure 2: Information flow graph corresponding to the rack model when $k > d_1$, with $k = 4$, $d_1 = 1$, $d_2 = 3$, and $n_1 = n_2 = 3$.



When $k \le d_1$, the mincut equations and the threshold function (2) of the rack model are exactly the same as those shown in [4] for the model described in Subsection II-B. Indeed, when $k \le d_1$, the rack model and the static cost model behave in the same way because $I = I_1$.

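Figure 2 shows an example of an information flow graph corresponding to a regenerating code with $k = 4$, $d_1 = 1$, $d_2 = 3$, and $n_1 = n_2 = 3$. Taking, for example, $\tau = 2$, we have $I_1 = \{5\beta_e, 3\beta_e\}$, $I_2 = \{3\beta_e, 4\beta_e\}$, and $I_3 = \{4\beta_e, 2\beta_e\}$. By Proposition 1, since $\sum_{i=0}^{1} I_2[i] > \sum_{i=0}^{1} I_3[i]$, we have $I = I_1 \cup I_3 = \{5\beta_e, 3\beta_e, 4\beta_e, 2\beta_e\}$, and then $L = [2, 3, 4, 5]$. Applying the corresponding mincut equation to the threshold function (2), we have that

$$
\alpha^*(d_1, d_2, \beta_e) =
\begin{cases}
\frac{M}{4}, & \beta_e \in [\frac{M}{8}, +\infty), \\
\frac{M - 2\beta_e}{3}, & \beta_e \in [\frac{M}{11}, \frac{M}{8}), \\
\frac{M - 5\beta_e}{2}, & \beta_e \in [\frac{M}{13}, \frac{M}{11}), \\
M - 9\beta_e, & \beta_e \in [\frac{M}{14}, \frac{M}{13}).
\end{cases}
$$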
The threshold function (2) leads to a tradeoff curve between $\alpha$ and $\beta_e$. Note that, as in the static cost model, since each rack has a different repair bandwidth, $\gamma_1$ or $\gamma_2$, this curve is expressed in terms of $\beta_e$ instead of $\gamma_1$ and $\gamma_2$.

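At the MSR point, the amount of stored data per node is $\alpha_{MSR} = M/k$. Moreover, at this point the minimum value of $\beta_e$ is $\beta_e = f(0) = \frac{M}{kL[0]}$, which leads to

$$
\gamma^1_{MSR} = \frac{(d_1\tau + d_2)M}{kL[0]} \quad \text{and} \quad \gamma^2_{MSR} = \frac{(d_2\tau + d_1)M}{kL[0]}.
$$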

On the other hand, at the MBR point, since $f(i)$ is a decreasing function, the value of $\beta_e$ which leads to the minimum repair bandwidths is

$$
\beta_e = f(|L| - 1) = \frac{M}{L[|L|-1](k - |L| + 1) + g(|L| - 1)}.
$$

Then, the corresponding amount of stored data per node is

$$
\alpha_{MBR} = \frac{L[|L|-1]\,M}{L[|L|-1](k - |L| + 1) + g(|L| - 1)},
$$

and the repair bandwidths are

$$
\gamma^1_{MBR} = \frac{(d_1\tau + d_2)M}{L[|L|-1](k - |L| + 1) + g(|L| - 1)} \quad \text{and} \quad \gamma^2_{MBR} = \frac{(d_2\tau + d_1)M}{L[|L|-1](k - |L| + 1) + g(|L| - 1)}.
$$

Since $L[|L|-1] = I_1[0]/\beta_e = d_1\tau + d_2$, this gives $\alpha_{MBR} = \gamma^1_{MBR}$.

Figure 3: Left: tradeoff curves between $\alpha$ and $\gamma$ for $k = 5$, $d_1 = 6$, $d_2 = 6$, and $M = 1$, so $k \le d_1$. Right: tradeoff curves between $\alpha$ and $\beta_e$ for $k = 10$, $d_1 = 5$, $d_2 = 6$, $n_1 = n_2 = 6$, and $M = 1$, so $k > d_1$.
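The MSR and MBR expressions can be checked numerically with a short sketch (ours), assuming $L$ has already been sorted and pruned so that $L[|L|-1] = d_1\tau + d_2$. Each point is returned as the tuple $(\alpha, \beta_e, \gamma_1, \gamma_2)$.

```python
from fractions import Fraction


def extremal_points(L, k, M, d1, d2, tau):
    """MSR and MBR points (alpha, beta_e, gamma1, gamma2) for a sorted, pruned list L."""
    g_last = sum(L[:-1])                                       # g(|L| - 1)
    beta_msr = Fraction(M, k * L[0])                           # f(0)
    beta_mbr = Fraction(M, L[-1] * (k - len(L) + 1) + g_last)  # f(|L| - 1)
    msr = (Fraction(M, k), beta_msr,
           (d1 * tau + d2) * beta_msr, (d2 * tau + d1) * beta_msr)
    mbr = (L[-1] * beta_mbr, beta_mbr,
           (d1 * tau + d2) * beta_mbr, (d2 * tau + d1) * beta_mbr)
    return msr, mbr


msr, mbr = extremal_points([2, 3, 4, 5], k=4, M=1, d1=1, d2=3, tau=2)
print(*msr)  # 1/4 1/8 5/8 7/8
print(*mbr)  # 5/14 1/14 5/14 1/2
```

With the list $L = [2, 3, 4, 5]$ of the Figure 2 example and $M = 1$, the MBR tuple shows $\alpha_{MBR} = \gamma^1_{MBR} = 5/14$.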
IV. Analysis 

In this section, we analyze the new fundamental tradeoff curve obtained in Section III for the rack model, and we compare these results with those of [3] and [4] where such a comparison is possible.

When $\tau = 1$, we have $\beta_e = \beta_c$, so $\gamma = d\beta_e$. This is the same case as the fundamental tradeoff curve shown in Subsection II-A, since one can assume that $\beta_e = \beta$. When $\tau > 1$ and $k \le d_1$, the rack model coincides with the model presented in Subsection II-B, and it uses more repair bandwidth than the model shown in Subsection II-A, as explained in [4]. Figure 3 (left) shows the tradeoff curves between $\alpha$ and $\gamma$ for the rack model when $k \le d_1$, for different values of $\tau$ (since $d_1 = d_2$ there, $\gamma_1 = \gamma_2 = \gamma$). Note that as $\tau$ increases, both $\alpha$ and $\gamma$ also increase, but the repair cost decreases, as we show later in this section. Moreover, both extremal points of each curve are shown: the MSR point is where $\alpha$ is minimum and the MBR point is where $\gamma$ is minimum. On the other hand, the case when $\tau > 1$ and $k > d_1$ is different from the previous models. An example is shown in Figure 3 (right). Note that as $\tau$ increases, $\beta_e$ decreases.

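Although the repair bandwidths $\gamma_1$ and $\gamma_2$ may increase with $\tau$, the repair cost always decreases. Since the rack model has two repair bandwidths, $\gamma_1$ and $\gamma_2$, it also has two repair costs, $C_T^1 = \beta_e(C_c d_1\tau + C_e d_2)$ and $C_T^2 = \beta_e(C_c d_2\tau + C_e d_1)$. As we have said, the case $\tau = 1$ is exactly the same as the one presented in [3]. In this case, for each $i = 0, \dots, k - 1$, taking $\gamma = f(i)$, we have $\beta = f(i)/d$. Then

$$
C_T^1(\tau = 1) = \frac{f(i)}{d}(C_c d_1 + C_e d_2) \quad \text{and} \quad C_T^2(\tau = 1) = \frac{f(i)}{d}(C_c d_2 + C_e d_1).
$$

From (1), we know that $f(i) = \frac{2M(d_1 + d_2)}{(2k - i - 1)i + 2k(d_1 + d_2 - k + 1)}$, so finally

$$
C_T^1(\tau = 1) = \frac{2M(C_c d_1 + C_e d_2)}{(2k - i - 1)i + 2k(d_1 + d_2 - k + 1)} \quad \text{and} \quad C_T^2(\tau = 1) = \frac{2M(C_c d_2 + C_e d_1)}{(2k - i - 1)i + 2k(d_1 + d_2 - k + 1)}.
$$

When $\tau > 1$, we have $\beta_e = f(i)$, with $f(i)$ as in Theorem 1, so

$$
C_T^1(\tau > 1) = \frac{M(C_c d_1\tau + C_e d_2)}{L[i](k - i) + g(i)} \quad \text{and} \quad C_T^2(\tau > 1) = \frac{M(C_c d_2\tau + C_e d_1)}{L[i](k - i) + g(i)}.
$$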
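Define

$$
r_1(\tau) = \frac{C_T^1(\tau > 1)}{C_T^1(\tau = 1)} \quad \text{and} \quad r_2(\tau) = \frac{C_T^2(\tau > 1)}{C_T^2(\tau = 1)}.
$$

We know that $\beta_e = f(i) = \frac{M}{L[i](k - i) + g(i)}$, so

$$
r_1(\tau) = \frac{(C_c d_1\tau + C_e d_2)\left((2k - i - 1)i + 2k(d_1 + d_2 - k + 1)\right)}{2(C_c d_1 + C_e d_2)\left(L[i](k - i) + g(i)\right)},
$$

and $r_2(\tau)$ is obtained in the same way by exchanging the roles of $d_1$ and $d_2$ in the cost factors. For every fixed $i$, these ratios are decreasing functions of $\tau$. This means that as $\tau$ increases, the repair costs $C_T^1$ and $C_T^2$ always decrease. Figure 4 (left) shows the decreasing behavior of $C_T^1$ and $\beta_e$ as $\tau$ increases.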
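To illustrate this claim for the case $k \le d_1$ (where $I = I_1$), the sketch below computes $C_T^1$ at each breakpoint $\beta_e = f(i)$ of Theorem 1, the $\tau = 1$ cost from equation (1), and their ratio $r_1(\tau)$, using the parameters of Figure 4 (left). The restriction to $k \le d_1$, the helper names and the sampled values of $\tau$ are assumptions of this sketch; it checks the decreasing behavior only at those sample points rather than proving it.

```python
from fractions import Fraction


def rack_L_small_k(k, d1, d2, tau):
    """Sorted L for the rack model when k <= d1, where I = I1 (units of beta_e)."""
    assert k <= d1, "this sketch only covers the case k <= d1"
    return sorted((d1 - i) * tau + d2 for i in range(k))


def repair_cost1(i, M, k, d1, d2, tau, Cc, Ce):
    """C_T^1 at the breakpoint beta_e = f(i) of Theorem 1."""
    L = rack_L_small_k(k, d1, d2, tau)
    return Fraction(M * (Cc * d1 * tau + Ce * d2), L[i] * (k - i) + sum(L[:i]))


def repair_cost1_tau1(i, M, k, d1, d2, Cc, Ce):
    """C_T^1(tau = 1) obtained from equation (1)."""
    return Fraction(2 * M * (Cc * d1 + Ce * d2),
                    (2 * k - i - 1) * i + 2 * k * (d1 + d2 - k + 1))


def r1(i, tau, M, k, d1, d2, Cc, Ce):
    """Ratio r_1(tau) = C_T^1(tau) / C_T^1(tau = 1) at a fixed index i."""
    return (repair_cost1(i, M, k, d1, d2, tau, Cc, Ce)
            / repair_cost1_tau1(i, M, k, d1, d2, Cc, Ce))


params = dict(M=1, k=5, d1=6, d2=6, Cc=1, Ce=10)  # parameters of Figure 4 (left)
for i in range(5):
    print(i, [round(float(r1(i, tau, **params)), 3) for tau in (1, 2, 3, 4)])
```

At $\tau = 1$ each ratio equals 1, which confirms that the Theorem 1 cost reduces to the cost derived from equation (1); for larger $\tau$ the printed ratios fall below 1.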

When $k \le d_1$, the static cost model and the rack model have the same behavior. However, when $k > d_1$, Figure 4 (right) shows that the rack model outperforms the static cost model in terms of $\beta_e$ and $\alpha$. Note that the repair cost $C_T$ of the static cost model is equivalent to $C_T^1$ of the rack model. For fixed $d_1$, $d_2$, and $\tau$, $C_T^1$ decreases as $\beta_e$ decreases, so the rack model also outperforms the static cost model in terms of repair cost. In [4], the authors show that the static cost model outperforms the basic model presented in [3] in terms of repair cost. Therefore, it follows that the rack model also outperforms the basic model in terms of repair cost.
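One way to see this comparison numerically, under our own simplifications: the sketch below builds $L$ for the static cost model (from the income multiset of Subsection II-B, assuming $d_2 \ge k - d_1 - 1$) and for the rack model (choosing $I_2$ or $I_3$ by Proposition 1), and compares the breakpoints $f(i)$ of Theorem 1 index by index for the parameters of Figure 4 (right). No value needs pruning here, since both lists have maximum $d_1\tau + d_2 = 16$. Helper names are ours.

```python
from fractions import Fraction


def static_L(k, d1, d2, tau):
    """Sorted incomes (units of beta_e) of the static cost model."""
    incomes = [(d1 - i) * tau + d2 for i in range(min(d1, k - 1) + 1)]
    incomes += [d2 - i for i in range(1, min(d2, k - d1 - 1) + 1)]
    return sorted(incomes)


def rack_L(k, d1, d2, n1, tau):
    """Sorted incomes of the rack model, choosing I2 or I3 by Proposition 1."""
    I1 = [(d1 - i) * tau + d2 for i in range(min(d1, k - 1) + 1)]
    I2 = [d2] * max(0, min(k - d1 - 1, n1 - d1 - 1))
    I2 += [(d2 - i) * tau for i in range(1, min(d2, k - n1) + 1)]
    I3 = [(d2 - i) * tau for i in range(1, min(d2, k - d1 - 1) + 1)]
    return sorted(I1 + (I2 if sum(I2) <= sum(I3) else I3))


def breakpoints(L, k, M):
    """Breakpoints f(i) = M / (L[i](k - i) + g(i)) of Theorem 1."""
    return [Fraction(M, L[i] * (k - i) + sum(L[:i])) for i in range(len(L))]


k, d1, d2, n1, tau, M = 10, 5, 6, 6, 2, 1  # parameters of Figure 4 (right)
fs = breakpoints(static_L(k, d1, d2, tau), k, M)
fr = breakpoints(rack_L(k, d1, d2, n1, tau), k, M)
print(all(a <= b for a, b in zip(fr, fs)))  # True
print(fs[-1], fr[-1])                       # 1/80 1/94
```

Every rack-model breakpoint is at most the corresponding static-model one, and at the MBR point $\beta_e$ drops from $1/80$ to $1/94$ for $M = 1$.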
Figure 4: Left: repair cost of the rack model for $M = 1$, $k = 5$, $d_1 = 6$, $d_2 = 6$, $C_c = 1$, and $C_e = 10$. The points correspond to the $k = 5$ values given by $f(i)$, $i = 0, \dots, 4$. Right: comparison of the rack model presented in this paper with the static cost model presented in [4] for $M = 1$, $k = 10$, $d_1 = 5$, $d_2 = 6$, $n_1 = n_2 = 6$, and $\tau = 2$.



V. Conclusions 

In this paper, a new mathematical model for a distributed storage environment is presented and analyzed. This model introduces different costs for downloading data units from nodes in the same rack and from nodes in different racks: downloading data units from nodes located in the same rack is much cheaper than downloading them from nodes located in a different rack. The rack model is a step towards more realistic distributed storage environments, such as those used by companies that store information over a network.

The rack model is analyzed in depth for the case of two racks, and the differences between this model and previous models are shown. Since it is a less simplified model than the previous ones, the rack model is more difficult to analyze. In this paper, we provide a complete analysis of the model, including some important contributions such as a generalization of the process to find the threshold function of a distributed storage system. This new threshold function also fits the previous models and makes it possible to represent information flow graphs with different repair costs.
We provide the general threshold function and apply it to the case of two racks. We give the tradeoff curve between the repair bandwidth and the amount of stored data per node and compare it with those of previous models. We also analyze the repair cost of this new model, and we conclude that the rack model outperforms previous models in terms of repair cost.